Decoding by Dynamic Chunking for Statistical Machine Translation

Authors

Sirvan Yahyaei

Christof Monz

Abstract

In this paper we present an extension of a phrase-based decoder that dynamically chunks, reorders, and applies phrase translations in tandem. A maximum entropy classifier is trained on the word alignments to find the best positions at which to chunk the source sentence. No language-specific or syntactic information is used to build the chunking classifier. Words inside a chunk are moved together, which enables the decoder to make long-distance re-orderings that capture the word order differences between languages with different sentence structures. To keep the search space manageable, phrases inside the chunks are translated monotonically; eliminating the unnecessary local re-orderings in this way makes it possible to perform long-distance re-orderings beyond the common fixed distortion limit. Experiments on German-to-English translation are reported.
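The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' decoder: the boundary probabilities stand in for the output of a trained maximum entropy classifier, and the names `split_into_chunks`, `chunk_orders`, `reordered_sources`, `threshold`, and `max_jump` are hypothetical. The sketch only shows how moving whole chunks, while keeping the words inside each chunk in their original order, lets a small chunk-level distortion limit cover a long word-level movement.

```python
from itertools import permutations
from typing import Iterator, List, Sequence, Tuple


def split_into_chunks(words: Sequence[str],
                      boundary_probs: Sequence[float],
                      threshold: float = 0.5) -> List[List[str]]:
    """Cut the sentence after words[i] whenever boundary_probs[i] >= threshold.

    boundary_probs has len(words) - 1 entries and is assumed to come from
    some boundary classifier; here it is just a list of numbers.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    for i, word in enumerate(words):
        current.append(word)
        is_last = i == len(words) - 1
        if is_last or boundary_probs[i] >= threshold:
            chunks.append(current)
            current = []
    return chunks


def chunk_orders(n_chunks: int, max_jump: int) -> Iterator[Tuple[int, ...]]:
    """Yield chunk orders whose every jump is at most max_jump chunks.

    A jump is measured as |next chunk index - (previous chunk index + 1)|,
    a chunk-level analogue of the usual phrase-based distortion distance.
    Brute-force enumeration, for illustration on short sentences only.
    """
    for order in permutations(range(n_chunks)):
        prev = -1
        within_limit = True
        for idx in order:
            if abs(idx - (prev + 1)) > max_jump:
                within_limit = False
                break
            prev = idx
        if within_limit:
            yield order


def reordered_sources(chunks: Sequence[Sequence[str]],
                      max_jump: int) -> Iterator[List[str]]:
    """Yield source word sequences after chunk-level reordering.

    Words inside each chunk stay in their original (monotone) order.
    """
    for order in chunk_orders(len(chunks), max_jump):
        yield [word for idx in order for word in chunks[idx]]


if __name__ == "__main__":
    sentence = ["ich", "habe", "gestern", "das", "Buch", "gelesen"]
    probs = [0.2, 0.9, 0.8, 0.1, 0.9]  # made-up classifier scores
    for candidate in reordered_sources(split_into_chunks(sentence, probs), 2):
        print(" ".join(candidate))
```

In this toy German example the verb chunk "gelesen" can move next to "ich habe" with a chunk-level limit of 2, although the same movement skips three source words. A real decoder would score such orders with translation and language models instead of enumerating permutations.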

Similar resources

From Finite-State to Inversion Transductions: Toward Unsupervised Bilingual Grammar Induction

We report a wide range of comparative experiments establishing for the first time contrastive foundations for a completely unsupervised approach to bilingual grammar induction that is cognitively oriented toward early category formation and phrasal chunking in the bootstrapping process up the expressiveness hierarchy from finite-state to linear to inversion transduction grammars. We show a cons...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

Determining the Boundaries and Types of Syntactic Phrases in Persian Texts

Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Tokenization into syntactic phrases, known as chunking, is an important preprocessing step in many applications such as machine translation, information retrieval, and text-to-speech. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...

Phrase-based Machine Translation using Multiple Preordering Candidates

In this paper, we propose a new decoding method for phrase-based statistical machine translation which directly uses multiple preordering candidates as a graph structure. Compared with previous phrase-based decoding methods, our method is based on a simple left-to-right dynamic programming in which no decoding-time reordering is performed. As a result, its runtime is very fast and implementing ...

Efficient Decoding for Statistical Machine Translation with a Fully Expanded WFST Model

This paper proposes a novel method to compile statistical models for machine translation to achieve efficient decoding. In our method, each statistical submodel is represented by a weighted finite-state transducer (WFST), and all of the submodels are expanded into a composition model beforehand. Furthermore, the ambiguity of the composition model is reduced by the statistics of hypotheses while...

Publication date: 2009
